(54) System for analyzing movement patterns

(57) A system and method are provided for detecting human movement patterns in and around a selected
area using imaging techniques. A video camera (110) is positioned so as to view the area of interest (410),
such as a promotional display, an automated teller machine (ATM), etc. The output (115) of the video camera
is fed in real-time into a frame grabber (120), where the video image is repeatedly and sequentially
digitized and stored in the memory (135) of a processing system (140). One or more passing zones and
looking zones are defined for the video image. A passing zone (430) is defined as a zone of the video image
in which a person would be located if "passing by" the area of interest. A looking zone (420) is defined as
a zone of the video image where a person would be located if "looking" at the area of interest, and is often
smaller in area than the passing zone. The processing system generates passing and/or looking events. Data
corresponding to the passing and looking events may be stored for further processing and/or analysis.
Description



The present invention generally relates to a system and method for detecting and analyzing movement patterns
of humans in and around selected areas using imaging techniques. In particular, the present invention relates to a
system and method for detecting, analyzing and quantifying movement activities, such as passing, looking and
dwell time, of customers in and around product displays, customer service areas, automated teller machines, interactive
kiosks, and other bounded areas in which customers or other people may be present.

It has become commonplace for retailers, manufacturers, and other organizations to rely on localized promotional
displays in order to promote the sale of their products. For example, a supermarket may place a free-standing or
wall-mounted promotional display on the end of an aisle, or in some other suitable location, in order to promote the sale of
a product, such as a soft drink. Promotional displays are generally designed to draw attention from passing customers,
so that the customers will hopefully stop, look at the display, and decide to purchase the product or service advertised
thereon. Such promotional displays, when working properly, can therefore aid in the marketing of the underlying product
or service.

While serving a somewhat different purpose than promotional displays, it has also become commonplace for banks
and other institutions to utilize localized automated teller machines (ATMs) and interactive computer kiosks to allow
customers to perform on-line transactions. ATMs and kiosks are similar to promotional displays in that they are often
designed to attract passing customers. Because some ATMs charge customers a usage fee, the ability to promote
usage can lead to increased revenue to the banks.

Unfortunately, it is usually difficult to precisely determine how effective a promotional display, ATM, etc. is in
attracting customers. For the promotional display, a retailer may compare the revenue associated with a product before
and after the installation of the promotional display, but this may not be an accurate indication of the effectiveness of
the display. For example, many factors can contribute to the level of sales of a product, only one of which is a promotional
display. For an ATM, it is possible for a bank to determine the total number of users of the ATM, and the type of
transactions performed, but it is usually difficult to determine the percentage of passing people who stop to use the
machine. Whether it is a promotional display, an ATM, etc., knowing how many people stop versus how
many do not stop can be quite valuable.

The ability to detect and analyze "passing" and "looking" habits of people can prove to be useful not just to
commercial organizations such as retailers, manufacturers and banks, but also in any other situation involving a localized
area of interest. For example, a rapid transit authority might be interested in whether people are stopping to look at a bus
schedule posted on the wall of the station. Changes to the way the schedule is organized and presented can be made
based upon the passing and looking habits of people passing by.

It is the object of the invention to provide a system and method for detecting and analyzing movement patterns, 
such as passing, looking and dwell time, of people in and around a selected area. 

According to the invention, there is provided a process for analyzing movement patterns in a localized area,
characterized by the steps of:

positioning a video camera so as to view the localized area;
capturing sequential images from the video camera at selected intervals in time;
storing the sequential images in a memory;
detecting whether a person exists within the stored sequential images;
sensing the time that any detected person exists in the stored images; and
generating an event once the sensed time reaches a predetermined threshold.

A system and method are provided for detecting human movement patterns in and around a selected area using
imaging techniques. A video camera may be positioned so as to view the area of interest, such as a promotional display,
an automated teller machine (ATM), etc. The output of the video camera is fed in real-time into a frame grabber, where
the video image is repeatedly and sequentially digitized and stored in the memory of a processing system. One or
more passing zones and looking zones are defined for the video image. A passing zone is defined as a zone of the
video image in which a person would be located if "passing" by the selected zone (such as by a promotional
display, ATM, etc.). A looking zone is defined as a zone of the video image where a person would be located if "looking"
at the selected zone. As people pass by the selected zone and/or look in the selected zone, the processing system
generates passing and/or looking events. While people remain within the looking zone, the processing system may
also monitor the system clock in order to measure how long a person looks at the product, ATM, etc. Dwell time, which
measures how long each person spends, is another important result produced by this system. Data corresponding to
the passing, looking and dwell time events may be stored for further processing and/or analysis.

The invention will now be described by way of example with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of the present invention in one embodiment.
FIG. 2 is a flow diagram depicting the overall steps that are performed by the present invention.
FIG. 3 is a hierarchical tree that depicts a data structure that may be used in accordance with the present invention.
FIGS. 4A and 4B are simplified versions of a camera view that may be processed by the present invention.
FIG. 5 is a flow diagram depicting the overall steps that may be performed by the present invention to implement
looking zone processing.
FIG. 6 is a flow diagram depicting the overall steps that may be performed by the present invention to implement
passing zone processing.
FIG. 7 is a flow diagram depicting the overall steps that may be performed by the present invention to implement
object filtering.
FIG. 8 is a flow diagram depicting the overall steps that may be performed by the present invention to determine
if an object has passed through a passing zone.
FIG. 9 is a flow diagram depicting the overall steps that may be performed by the present invention to implement
background updating.
FIG. 10 is a flow diagram depicting the steps that may be performed by the present invention to implement object
matching.

The system of the present invention operates to process a sequence of input images received from a camera 
monitoring a promotional display, an ATM, or other area of interest in order to collect data that is useful in analyzing 
customer interaction with the display, etc. This analysis may provide the following types of information:

• number of people (for example, customers) that pass the display, etc.
• number of people that stop and look at the display, etc.
• length of time each person looked at the display, etc.

For each camera view, "passing zones" and "looking zones" are defined by the user. Passing zones are used to
determine the number of people that walk past the promotional display, and looking zones are used to determine the
number of people that look at the display.

FIG. 1 depicts the overall structure of the present invention in one embodiment. The hardware components of the
present invention may consist of standard off-the-shelf components. The primary components in the system are one
or more video cameras 110, one or more frame grabbers 120, and a processing system 130, such as a personal
computer (PC). The combination of the PC 130 and frame grabber 120 may collectively be referred to as a "video
processor" 140. The video processor 140 receives a standard video signal format 115, such as RS-170, NTSC, CCIR, or
PAL, from one or more of the cameras 110, which can be monochrome or color. The camera(s) is/are positioned to
view a selected area of interest, such as a promotional display in a retail establishment, an automated teller machine
(ATM), or any other localized area where people (or even other objects) pass by, and optionally stop in order to look.

The video signal 115 is input to the frame grabber 120. In one embodiment, the frame grabber 120 may comprise
a Meteor Color Frame Grabber, available from Matrox. The frame grabber 120 operates to convert the analog video
signal 115 into a digital image stored within the memory 135, which can be processed by the video processor 140. For
example, in one implementation, the frame grabber 120 may convert the video signal 115 into a 640 x 480 (NTSC) or
768 x 576 (PAL) gray level image. Each pixel may have 8 bits of resolution; 8 bits of resolution is usually sufficient,
and color data could be used to increase system performance. Of course, a variety of other digital image formats and
resolutions may be used as well, as will be recognized by one of ordinary skill.

At this time, analysis of the image begins by the PC 130, as will be described in further detail below.
The high level control flow of the present invention, performed by the processing system 130, is described below,
both in pseudo-code and in a more detailed format. Reference is made in [brackets] in the pseudo-code to the steps
illustrated in FIG. 2:

[201]   initialize system parameters
[202]   configure video streams
[203]   while ( true )
[204]     for ( j = 0; j < total number of video inputs; j++ )
[205]       get next frame from video input j
[206]       for ( i = 0; i < total number of displays; i++ )
[207]         process looking zones for display i
[208]         process passing zones for display i
[208a]      end for
[209]     end for
[210]   end while

In step 201, the system parameters are initialized. System parameters can be viewed as a tree structure, as shown
in FIG. 3. The root 310 of the tree is "video inputs", which is the highest level data structure representing the video
signal(s) 115. The next level 320 of the tree contains one or more nodes 321, each corresponding to one of the video
signals 115; for each video signal 115, there is one node 321. Each node 321 includes as a parameter a frame grabber
identifier 322 which identifies the frame grabber 120 assigned to the corresponding video signal 115.

For each video signal 115, the present invention may be implemented to support the analysis of one or more areas
of interest within the view of the camera 110. For example, the camera 110 may be aimed at two promotional displays
within a retail establishment, wherein each of the two displays contains different products. Thus, there may be two
areas of interest within the video image of each video signal 115. This is represented within the system parameter tree
structure of FIG. 3 as yet another level 330, containing one or more nodes 331. Each node 331 is a child of one of the
nodes 321 of level 320, and each node 331 corresponds to one area of interest (e.g., one promotional display, one
ATM, etc.). Associated with each node 331 may be a display identifier 332 which uniquely identifies the particular
display, as well as a "looking time" parameter 333 which identifies the minimum amount of time that must pass for a
looking event to be logged (described below).

FIG. 4A is a depiction of a video display capturing an image from the video camera 110, stored in memory 135, of
a promotional display 410, with the looking zone 420 and passing zone 430 defined. Likewise, FIG. 4B is a depiction
of a video display where there are two displays 410, two looking zones 420 and two passing zones 430. Within each
camera view, multiple passing zones 430 can be defined.

A rectangular passing zone 430 is specified. As people move through the passing zone 430, the system generates
events, called "passing events", that indicate that a person has passed through the passing zone 430.

In order to determine the number of people that actually look at a promotional display, etc. 410, it is necessary to
determine when a person enters the looking zone 420 and capture the amount of time the person remains in the looking
zone 420. If a person remains in the looking zone 420 for a minimum amount of time, a looking event is generated.

A looking zone 420 is processed differently to a passing zone 430. 

The output of the system of the present invention is a set of events that represent persons passing the display 410
and persons looking at the display 410. There is a one-to-one correspondence between an event and a person passing
through the passing zone 430. Each time a person remains in the looking zone 420 longer than a predetermined time
period, a looking event is generated. TABLE 5 (see below) illustrates an exemplary format of a passing event, while
TABLE 4 illustrates an exemplary format of a looking event.

A passing event may require approximately 12 bytes of data storage, whereas a looking event may require
approximately 16 bytes.

TABLE 1 contains the data storage requirements for a typical configuration monitoring one promotional display
410. It is assumed that twenty percent of passing people actually stop and look at the display 410:

TABLE 1

Total Passing   Total Looking   Bytes/Passing   Bytes/Looking   Storage
Customers       Customers       Event           Event           Requirements
-------------   -------------   -------------   -------------   ------------
500             100             12              16              7.6K
1000            200             12              16              15.2K
3000            600             12              16              45.6K
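The figures in TABLE 1 follow directly from the event sizes given above. As a minimal sketch of that arithmetic (the function and constant names are illustrative and not part of the described system; K is taken here to mean 1000 bytes):

```python
# Illustrative only: the event sizes and the 20% looking ratio come from the
# text; the names below are hypothetical.
PASSING_EVENT_BYTES = 12
LOOKING_EVENT_BYTES = 16


def storage_bytes(total_passing: int, looking_fraction: float = 0.20) -> int:
    """Bytes needed to store every passing event and every looking event."""
    total_looking = round(total_passing * looking_fraction)
    return (total_passing * PASSING_EVENT_BYTES
            + total_looking * LOOKING_EVENT_BYTES)


for customers in (500, 1000, 3000):
    print(customers, storage_bytes(customers) / 1000, "K")
```

For 500 passing customers this gives 500 x 12 + 100 x 16 = 7600 bytes, matching the first row of the table.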



Referring again to FIG. 3, associated with each node 331 (which corresponds to a display, ATM, etc.) is yet another
level 340 of the tree structure. Level 340 contains one or more looking zone nodes 341 and passing zone nodes 342,
each pair of which may be associated with one node 331. Thus, each looking zone 420 and passing zone 430 defined
for an individual display 331 may use the same display identifier 332.

The following parameters may exist for each looking zone node 341:

Bounding Box: The bounding box describes the location of the looking zone 420 within the camera 110
view. The location information contains the x,y coordinates of the top left corner of the
looking zone 420 and lower right corner of the looking zone 420 in the image coordinate
system, as illustrated in FIG. 4.

Percent Occupied: The percent occupied parameter specifies the number of pixels that must be "On" for the
looking zone 420 to be considered occupied. This is described in further detail below with
respect to FIG. 5.

Difference Threshold: Specifies the difference threshold used when performing the image difference operation.
This is also described in further detail below with respect to FIGS. 5 and 9.



The following parameters may also exist for each looking zone node 341. Each of these parameters is also described
in further detail below with respect to FIG. 9.

Previous Threshold:
Count Threshold:
Percent Difference Threshold:
Background Image:

The following parameters exist for each passing zone node 342.

Bounding Box: The bounding box describes the location of the passing zone 430 within the camera 110
view. The location information contains the x,y coordinates of the top left corner of the
passing zone 430 and lower right corner of the passing zone 430 in the image coordinate
system. See FIG. 4.

Difference Threshold: Specifies the difference threshold used when performing the image difference operation.
This is described in further detail below with respect to FIGS. 6 and 9.

Adjacency Distance: Specifies the adjacency constraint for region labelling. This is described elsewhere with
respect to FIG. 6.

Minimum Object Size: Specifies the minimum object size, in pixels, used in the filtering operation. This is
described elsewhere with respect to FIG. 7.

Maximum Distance: Specifies the maximum distance, in pixels, allowed between two objects when matching.
This is described elsewhere with respect to FIG. 10.

Path Length Threshold: Specifies the minimum length an object track must exceed in order to be counted. This
is described elsewhere with respect to FIG. 8.

The following parameters may also exist for each passing zone node 342. Each of these parameters is also described
in further detail below with respect to FIG. 9.

Previous Threshold: 
Count Threshold: 
Percent Difference Threshold:

Background Image: 
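One way to picture the parameter tree of FIG. 3 is as nested records. The following is a minimal sketch only: the class and field names are hypothetical, the example values are invented for illustration, and the background-updating parameters (described with respect to FIG. 9) are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (left, top, right, bottom) in image coordinates; this ordering is an assumption.
BoundingBox = Tuple[int, int, int, int]


@dataclass
class LookingZone:                 # looking zone node 341
    bounding_box: BoundingBox
    percent_occupied: int          # number of "On" pixels needed for occupancy
    difference_threshold: int


@dataclass
class PassingZone:                 # passing zone node 342
    bounding_box: BoundingBox
    difference_threshold: int
    adjacency_distance: int
    min_object_size: int
    max_distance: float
    path_length_threshold: float


@dataclass
class Display:                     # area-of-interest node 331
    display_id: int                # display identifier 332
    looking_time: float            # "looking time" parameter 333, in seconds
    looking_zones: List[LookingZone] = field(default_factory=list)
    passing_zones: List[PassingZone] = field(default_factory=list)


@dataclass
class VideoInput:                  # video signal node 321
    frame_grabber_id: int          # frame grabber identifier 322
    displays: List[Display] = field(default_factory=list)


# The root 310 ("video inputs") is simply the list of VideoInput records.
video_inputs = [
    VideoInput(frame_grabber_id=0, displays=[
        Display(display_id=1, looking_time=3.0,
                looking_zones=[LookingZone((200, 150, 400, 300), 500, 30)],
                passing_zones=[PassingZone((100, 250, 540, 420), 30, 2, 50, 40.0, 100.0)]),
    ]),
]
```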

After initialization in step 201, the video streams are configured in step 202. In order to optimize the use of system
resources (CPU 130 and memory 135), it is only necessary to capture and store the portion of the image that will be
analyzed by the processing system 130. Again, one or more looking zones 420 (as defined in node 341) and passing zones 430
(as defined in node 342) are defined for the promotional display, etc. 410 being monitored. In one embodiment, these
zones 420, 430 are the only portions of the image that are analyzed by the present invention. The largest window 440
that encloses all looking zones 420 and passing zones 430 is called the region of interest (ROI). In this embodiment,
when the present invention is operating, only the ROI 440 need be captured and stored in memory 135. Since the
present invention may support multiple cameras, each video input stream 115 may be configured independently, based
on the ROI 440 for each camera 110 view.
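As a rough illustration of how the ROI 440 might be derived from the configured zones, the sketch below reads the ROI as the smallest axis-aligned window that encloses every zone bounding box. That reading, the (left, top, right, bottom) box layout, and the function name are assumptions made for the example.

```python
def region_of_interest(boxes):
    """Return the window enclosing every (left, top, right, bottom) box."""
    if not boxes:
        raise ValueError("at least one zone is required")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


# Hypothetical zone coordinates for one camera view.
looking_zone = (200, 150, 400, 300)
passing_zone = (100, 250, 540, 420)
roi = region_of_interest([looking_zone, passing_zone])
print(roi)
```

Only the pixels inside `roi` would then need to be captured and stored for that camera view.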

At step 203 (FIG. 2), the present invention begins to process the video input streams 115. The processing system
130 processes each video stream 115 sequentially: for each video input 115, the system 130 obtains the next frame
(digital image) (steps 204-209). The system 130 sequentially analyzes each display 410 that is configured within the
camera 110 view: for each display 410, the system 130 analyzes the set of looking zones 420 and passing zones
430 associated with the display 410, as described in further detail below.

Looking Zone Processing 

In step 207, the looking zones 420 are processed. Below is the pseudo code for the looking zone process which
is performed by the processing system 130 using the video image stored in memory 135. Reference is made in
[brackets] in the pseudo-code to the steps of FIG. 5:

[501]   for ( i = 0; i < number of looking zones; i++ )
[502]     extract the region of interest for zone i
[503]     generate the background image difference
[504]     if ( pixels on > percentOccupied )
[505]       status[i] = OCCUPIED
[506]     else
[507]       status[i] = NOT_OCCUPIED
[508]     end if
[509]     update current background
[510]   end for
[511]   if ( displayStatus == NOT_OCCUPIED && getStatus() == OCCUPIED )
[512]     startTime = current time
[513]     displayStatus = OCCUPIED
[514]   else if ( displayStatus == OCCUPIED && getStatus() == NOT_OCCUPIED )
[515]     if ( (current time - startTime) > lookingTime )
[516]       save looking event
[517]     end if
[518]     displayStatus = NOT_OCCUPIED
[519]   end if

Again, each display 410 being monitored has one or more looking zones 420 associated with the display 410. If
a person is detected in any one of the looking zones 420, the display 410 is considered occupied or in use. Initially,
the status of each looking zone 420 (as defined in the data structure of FIG. 3), status[i], and the status of the display
are set to NOT_OCCUPIED.

For each video signal input 115, the following is performed. In steps 501-510, the processing system 130
sequentially processes each looking zone 420 (utilizing the data structure of FIG. 3), performing the following functions. First,
in step 502, the region of interest (ROI) 420 is extracted from the input frame 440 for the corresponding looking zone
420. Again, the input frame 440 corresponds to a bounding box that encloses the one or more looking zones 420 or
passing zones 430. The output of the operation is a smaller frame that contains only the region of the image that
corresponds to the looking zone 420 being analyzed.

Next, in step 503, an image difference is generated using the extracted frame and the current background
associated with the looking zone 420. The image difference is used to detect objects in the looking zone 420 that are not
in the background. Since new objects in the scene generally exhibit a different set of gray scale intensities or colors
(depending upon whether gray scaling or color is used) than the background, new objects in the scene can be detected.
The following equation may be used in generating the image difference for a looking zone 420 of width n and height m:

\[
\sum_{x=0}^{n} \sum_{y=0}^{m} D(x,y) = f\bigl(I(x,y),\, B(x,y)\bigr)
\]

where

\[
f(i,b) =
\begin{cases}
1, & \text{if } |i-b| \ge D \\
0, & \text{if } |i-b| < D
\end{cases}
\]

and where D is the difference threshold.
Pixels in the image difference with a value of 1 represent locations within the image where objects have been
detected. The threshold value selected for the image difference operation must be large enough to eliminate noise
resulting from shadows, changing lighting conditions, and other environmental factors. However, the threshold must
be small enough to detect the presence of people within the scene.
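The thresholded difference described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the patented implementation: I is taken to be the current zone image and B its background image (the text does not define the symbols explicitly), images are plain lists of gray-level rows, a pixel is marked "On" when the absolute difference reaches the threshold, and all names are hypothetical.

```python
def image_difference(frame, background, threshold):
    """Return a binary image: 1 where |frame - background| >= threshold."""
    return [
        [1 if abs(i - b) >= threshold else 0 for i, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def pixels_on(diff):
    """Count the pixels set to 1 in a binary image."""
    return sum(sum(row) for row in diff)


# Tiny 3 x 2 example with invented gray levels.
background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[12, 90, 95],
         [10, 11, 80]]

diff = image_difference(frame, background, threshold=30)
print(diff)             # [[0, 1, 1], [0, 0, 1]]
print(pixels_on(diff))  # 3
```

The small differences of 1 and 2 gray levels fall below the threshold and are treated as noise, while the three strongly changed pixels are flagged as belonging to a new object.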

To determine if the looking zone 420 is occupied, in steps 504-508 the processing system 130 compares the total
number of pixels in the image difference with a value of 1 to the percent_occupied parameter. The percent_occupied
parameter is the total number of pixels that must be different between the current frame and background frame for the
looking zone to be considered occupied. The status of the looking zone 420 is updated, and finally, the current looking
zone background is updated in step 509, as explained in further detail below.

After each looking zone is processed, the system checks for a change in the status of the display in steps 511-519.
In step 511, if any looking zone is occupied and the status of the display was NOT_OCCUPIED, a person has entered
one of the looking zones. In steps 512 and 513, the status of the display is set to OCCUPIED, and the time at which the
display became occupied is stored in startTime.

In step 514, if the display is not occupied, the system must determine if a person has left the display. If the display
status is OCCUPIED and the status of each zone is NOT_OCCUPIED, a person left the display. In steps 515-518, if
the person remained at the display for greater than lookingTime seconds, the system logs a looking event. If the person
was only present for a short time period, the event is not logged.
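The display status logic of steps 511-519 amounts to a two-state machine. The sketch below is one possible rendering: it assumes getStatus() reports OCCUPIED when any looking zone is occupied (as the text states for the display as a whole), records each looking event as a (start time, dwell time) pair, and uses hypothetical class and field names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DisplayMonitor:
    looking_time: float                      # minimum dwell, in seconds
    occupied: bool = False                   # displayStatus
    start_time: float = 0.0                  # startTime
    events: List[Tuple[float, float]] = field(default_factory=list)

    def update(self, zone_occupied: List[bool], now: float) -> None:
        """Process the per-zone occupancy flags for one frame."""
        any_occupied = any(zone_occupied)
        if not self.occupied and any_occupied:        # steps 511-513
            self.start_time = now
            self.occupied = True
        elif self.occupied and not any_occupied:      # steps 514-518
            dwell = now - self.start_time
            if dwell > self.looking_time:
                self.events.append((self.start_time, dwell))
            self.occupied = False


monitor = DisplayMonitor(looking_time=3.0)
for now, zones in [(0.0, [False]), (1.0, [True]), (2.0, [True]), (6.0, [False]),
                   (10.0, [True]), (11.0, [False])]:
    monitor.update(zones, now)
print(monitor.events)   # [(1.0, 5.0)]
```

The first visit lasts five seconds and is logged; the second lasts one second, which is below the looking time, so no event is saved.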

Passing Zone Processing 

In step 208, the passing zones 430 are processed. Below is the pseudo code for the passing zone process which
is performed by the processing system 130 using the video image stored in memory 135. Reference is made in
[brackets] in the pseudo-code to the steps of FIG. 6.

[601]   for ( i = 0; i < number of passing zones; i++ )
[602]     extract the region of interest for passing zone i
[603]     generate the background image difference
[604]     perform region labeling
[605]     filter objects
[606]     track objects
[607]     update passing count
[608]     update current background
[609]   end for

Each display 410 being monitored has one or more passing zones 430 associated with the display 410. If a person
is detected as walking through one of the passing zones 430, the passing count for the display 410 is updated.

For each frame input 115, the following is performed. In steps 601-609, the processing system 130 sequentially
processes each passing zone 430, performing the following functions. First, in step 602, the region of interest 430 is
extracted from the input frame 440 for the corresponding passing zone 430. Again, the input frame 440 corresponds
to a bounding box that encloses the one or more passing zones 430 and looking zones 420. The output of the operation
is a smaller frame that contains only the region of the image that corresponds to the passing zone 430 being analyzed.

Next, in step 603, an image difference is generated using the extracted frame and the current background
associated with the passing zone 430. The image difference is used to detect objects in the passing zone 430 that are not
in the background. Since new objects in the scene generally exhibit a different set of gray scale intensities than the
background, new objects in the scene can be detected. The same method is used as in looking zone background
differencing, previously described with respect to FIG. 5.

At this point, a binary image exists in memory 135 with values of 0 and 1. A pixel value of 1 indicates the presence
of an object that is different than the background. Next, in step 604, a region labeling operation is performed on the
image difference in order to group adjacent pixels that are "On" in the image difference. For example, assume a portion
of the binary image contains the following pixel values (as shown in TABLE 2).
TABLE 2 (binary pixel values; contents illegible in the source)



The region labeling of step 604 may identify two regions of adjacent pixels, A and B. These regions are shown
below (TABLE 3):
TABLE 3 (the binary image with regions A and B marked; contents illegible in the source)



Regions A and B correspond to two objects that have been identified in the passing zone 430. Since noise and
poor segmentation of an object with similar gray level intensity to the background can result in less than optimal
segmentation, it is necessary to relax the adjacency constraint. Adjacency can be defined as a distance function,
adjacencyDistance, expressed in terms of pixel distance. The set of pixels adjacent to pixel $P_{i,j}$ within a
distance D is given below.

\[
\sum_{k=-D}^{D} \sum_{l=-D}^{D} P_{i+k,\, j+l}
\]

In the preferred embodiment, distance is a system parameter. In the example above, with a distance of 2, regions A
and B would be grouped as a single object.
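To make the relaxed adjacency constraint concrete, the following sketch labels regions in a small binary image, treating two "On" pixels as adjacent when both their row and column offsets are at most the adjacency distance. The flood-fill approach, the function name, and the example image are illustrative assumptions; the pixel grid below is invented and is not the one from TABLE 2.

```python
from collections import deque


def label_regions(binary, adjacency_distance=1):
    """Group "On" pixels lying within adjacency_distance of each other.

    Returns a list of regions, each a list of (x, y) pixel coordinates.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    d = adjacency_distance
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            region = []
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                # Visit every pixel P[i+k][j+l] with -d <= k, l <= d.
                for ny in range(max(0, cy - d), min(height, cy + d + 1)):
                    for nx in range(max(0, cx - d), min(width, cx + d + 1)):
                        if binary[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            regions.append(region)
    return regions


image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
print(len(label_regions(image, adjacency_distance=1)))  # 2
print(len(label_regions(image, adjacency_distance=2)))  # 1
```

With a distance of 1 the two blobs are separated by an empty column and stay distinct; with a distance of 2 the gap is bridged and they merge into a single object.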

The output of the region labeling algorithm is a set of object descriptors, one for each object or region identified
in the image. Each object descriptor consists of the object centroid, the x,y coordinate for the center of the object, and
the size of the object in pixels. It is also possible to extract other object features, such as gray scale intensity, color
features, etc., in order to improve the performance of the object matching algorithm described later.

In step 605, after objects have been identified in the current image, a filtering operation is performed. Objects that
are too small should be discarded, since these likely correspond to objects resulting from noise or other segmentation
related problems. The filtering causes objects that are too small to be a person (e.g., smaller than minObjectSize) to be
discarded. Object filtering is described by the pseudo-code below, with reference to FIG. 7.

[701]   for each object
[702]     if ( object size < minObjectSize )
[703]       discard object
[704]     end if
[705]   end for
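A short sketch of building object descriptors and applying the size filter follows. The dictionary layout and function names are assumptions for illustration; a region is taken to be a list of (x, y) pixel coordinates.

```python
def describe(region):
    """Build an object descriptor (centroid and pixel count) for one region."""
    size = len(region)
    centroid_x = sum(x for x, _ in region) / size
    centroid_y = sum(y for _, y in region) / size
    return {"centroid": (centroid_x, centroid_y), "size": size}


def filter_objects(objects, min_object_size):
    """Keep objects of at least min_object_size pixels; smaller ones are discarded."""
    return [obj for obj in objects if obj["size"] >= min_object_size]


# Hypothetical regions: a 2 x 2 blob and a single noisy pixel.
regions = [[(0, 0), (1, 0), (0, 1), (1, 1)], [(7, 3)]]
objects = [describe(region) for region in regions]
kept = filter_objects(objects, min_object_size=3)
print(kept)   # [{'centroid': (0.5, 0.5), 'size': 4}]
```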
At this point the processing system 130 has a set of objects that should correspond to people within the scene.
In step 606 the system 130 now must match objects in the current frame with objects in the previous frame. TrackObjects
is an array of objects that are currently being tracked. An entry in the TrackObjects array may consist of

the associated object descriptor from the previous frame, and
the initial object location when first identified.

NewObjects is an array of object descriptors for the current frame. The sub-steps performed within step 606 to
perform object matching are described as pseudo-code below, with reference to FIG. 10.
[1001]  if ( number of NewObjects == 0 )
[1002]    delete all TrackObjects
[1003]    return
[1004]  end if
[1006]  for ( k = 0; k < number of TrackObjects; k++ )
[1007]    trackStatus[k] = NOT_MATCHED
[1009]  for ( k = 0; k < number of NewObjects; k++ )
[1010]    objectStatus[k] = NOT_MATCHED
[1012]  for ( k = 0; k < number of TrackObjects; k++ )
[1013]    bestD = MAXIMUM
[1014]    index = -1
[1015]    for ( j = 0; j < number of NewObjects; j++ )
[1016]      if ( objectStatus[j] == NOT_MATCHED )
[1017]        d = distance(TrackObjects[k], NewObjects[j])
[1018]        if ( d < maxDistance AND d < bestD )
[1019]          bestD = d
[1020]          index = j
[1021]        end if
[1022]      end if
[1023]    end for
[1025]    if ( index != -1 )
[1026]      objectStatus[index] = MATCHED
[1027]      trackStatus[k] = MATCHED
[1028]      update TrackObjects[k] with NewObjects[index]
[1029]    end if
[1030]  end for
[1032]  for ( k = 0; k < number of TrackObjects; k++ )
[1033]    if ( trackStatus[k] == NOT_MATCHED )
[1034]      delete TrackObjects[k]
[1035]    end if
[1036]  end for
[1037]  for ( k = 0; k < number of NewObjects; k++ )
[1038]    if ( objectStatus[k] == NOT_MATCHED )
[1039]      add NewObjects[k] to TrackObjects
[1040]    end if
[1041]  end for

The following is a description of the above pseudo code with respect to FIG. 10.
In step 1001, if the system did not identify any objects in the current frame, then all of the current TrackObjects
are deleted in step 1002, and the routine is exited. Beginning at step 1006, if new objects were identified, the status
of each existing track and each new object is set to NOT_MATCHED in steps 1007 and 1010 respectively. The system
now searches through the list of current TrackObjects attempting to find the best matching object identified in the current
input frame.

The matching criterion used is the location of object centroids from the previous frame and current frame. The list
of TrackObjects is traversed in step 1012. For each TrackObjects entry, the best matching NewObjects entry is located. In step
1013 the distance score, bestD, is set to the maximum value. The index of the best matching object is set to -1 in step
1014. In step 1015, the NewObjects list is traversed. Beginning at step 1016, for each entry in NewObjects that has
not already been matched to a TrackObjects entry, the system calculates the distance score, d, in step 1017. In steps
1018-1023, if d is less than the distance threshold, maxDistance, and less than the current bestD, a match is found.
The index of the NewObjects object is saved and bestD is updated. The distance() function calculates the pixel
distance between the centroid of the TrackObjects entry and the centroid of the NewObjects entry.

Beginning at step 1025, if a matching object has been found, then the corresponding TrackObjects entry is updated with the current object descriptor information in steps 1026-1030. At step 1032, for each TrackObjects entry not matched, the object is deleted in steps 1033-1036. At step 1037, for each NewObjects entry not matched, a new TrackObjects entry is created in steps 1038-1041.
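
By way of illustration only, the matching pass of FIG. 10 might be sketched in Python as follows. The dictionary keys `centroid`, `path` and `counted`, and the function name `update_tracks`, are assumptions introduced for this sketch and do not appear in the description; the `path` list is kept only so that a path length can be measured later.

```python
import math

NOT_MATCHED, MATCHED = 0, 1

def update_tracks(tracks, new_objects, max_distance):
    """Greedy nearest-centroid matching; returns the updated track list."""
    if not new_objects:
        return []  # steps 1001-1002: nothing detected, so every track is dropped
    object_status = [NOT_MATCHED] * len(new_objects)
    survivors = []
    for track in tracks:                                  # step 1012
        best_d, index = float("inf"), -1                  # steps 1013-1014
        for j, obj in enumerate(new_objects):             # step 1015
            if object_status[j] == NOT_MATCHED:           # step 1016
                d = math.dist(track["centroid"], obj["centroid"])  # step 1017
                if d < max_distance and d < best_d:       # step 1018
                    best_d, index = d, j
        if index != -1:                                   # steps 1025-1029
            object_status[index] = MATCHED
            track["centroid"] = new_objects[index]["centroid"]
            track["path"].append(track["centroid"])
            survivors.append(track)
        # An unmatched track is not carried over (steps 1032-1036).
    for j, obj in enumerate(new_objects):                 # steps 1037-1041
        if object_status[j] == NOT_MATCHED:
            survivors.append({"centroid": obj["centroid"],
                              "path": [obj["centroid"]],
                              "counted": False})
    return survivors
```

Because the tracks are visited in list order, an earlier track may claim an object that a later track would have matched more closely.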

The process described above with respect to FIG. 10 is a preferred embodiment. However, improvements in this process may be made, as described below.

In the description above, the distance function (step 1017) may be calculated by determining the distance from the centroid of each object. The notion of distance can be extended to include features other than object location. For example, other object features that can be exploited to increase accuracy are object size, gray scale intensity of an object, or color of an object. The distance function can be written as a weighted equation, as shown below:

$$d = w_1 \cdot (\text{location distance}) + w_2 \cdot (\text{size distance}) + w_3 \cdot (\text{intensity distance}) + \dots + w_N \cdot (\text{feature } N)$$



One approach being used for calculating the intensity distance is to extract an N x M region from around the centroid of the object. This data is then stored as a data element in the object descriptor. The distance score for intensity is then calculated by performing a normalized cross correlation function on the TrackObjects region and the NewObjects region. This approach can be extended to include any object feature type that may be useful in object matching.
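
The weighted distance and the normalized cross correlation might be sketched as follows. This is a minimal illustration under stated assumptions: the regions are flat lists of gray-scale values of equal length, the correlation is converted to a distance as 1 minus its value, and the keys `centroid`, `size` and `patch` are hypothetical names for fields of the object descriptor.

```python
import math

def normalized_cross_correlation(a, b):
    """NCC of two equal-sized gray-scale regions given as flat lists; range [-1, 1]."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # a flat region has no defined correlation; treat as dissimilar
    return sum(x * y for x, y in zip(da, db)) / denom

def weighted_distance(track, obj, w_location=1.0, w_size=0.0, w_intensity=0.0):
    """d = w1 * location distance + w2 * size distance + w3 * intensity distance."""
    location = math.dist(track["centroid"], obj["centroid"])
    size = abs(track["size"] - obj["size"])
    # Assumption: similarity becomes a distance as 1 - NCC (0 means identical).
    intensity = 1.0 - normalized_cross_correlation(track["patch"], obj["patch"])
    return w_location * location + w_size * size + w_intensity * intensity
```

The three terms are in different units (pixels, area, and a unitless score), so the weights would need to be chosen to bring them onto a comparable scale.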

A second approach is to extend the process of FIG. 10 to identify the best match for each TrackObjects and NewObjects object pairing. Steps 1012-1025 locate the best matching NewObjects object for the TrackObjects objects in sequential order. A drawback to this approach is that if NewObjects entry X is the best match for TrackObjects N and N+1, the object is always matched with TrackObjects N. If the distance score for TrackObjects N+1 is better, then entry X should be matched to TrackObjects N+1. This anomaly in matching may also result in the system discarding or creating a new TrackObjects entry for the object that should have been matched with TrackObjects N. Identifying the best match over all pairings would increase algorithm accuracy, but would increase the computational cost in calculating distance.
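
One way such a best-pairing search could be sketched is to score every track/object pair first and then accept pairs from closest to farthest. The function name `match_globally` is an assumption, and this closest-first rule is only one possible strategy; it does not necessarily minimise the total distance over all pairs.

```python
import math

def match_globally(track_centroids, object_centroids, max_distance):
    """Return {track_index: object_index}, accepting the closest pairs first."""
    pairs = []
    for k, t in enumerate(track_centroids):
        for j, o in enumerate(object_centroids):
            d = math.dist(t, o)  # every pairing is scored, hence the extra cost
            if d < max_distance:
                pairs.append((d, k, j))
    pairs.sort()
    matches, used_objects = {}, set()
    for d, k, j in pairs:
        if k not in matches and j not in used_objects:
            matches[k] = j
            used_objects.add(j)
    return matches
```

For tracks at (0, 0) and (4, 0) and a single new object at (3, 0), sequential matching would give the object to the first track, whereas this sketch gives it to the second, closer track.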

Referring to FIG. 8, after the object tracks have been updated, the system must determine if an object has passed through the passing zone 430. This is done by analyzing the length of the object track. For each object track, line 801, the system determines if the object has already been counted, line 802. If the object has not been counted, the system compares the object track length to the pathLengthThreshold. If the path length exceeds the threshold, line 803, the passing count is updated in 804. The count flag for the object is set to true, line 805, so that the object will not be counted twice. The pseudo-code for FIG. 8 appears below.
[801] for ( j = 0; j < number of track objects; j++ )
[802]     if ( TrackObjects[j].count = FALSE )
[803]         if ( pathLength(TrackObjects[j]) > pathLengthThreshold )
[804]             passingCount = passingCount + 1
[805]             TrackObjects[j].count = TRUE
[806]         end if
[807]     end if
[808] end for
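
The passing count of FIG. 8 might be sketched as follows. The description does not define pathLength; this sketch assumes it is the sum of the straight-line distances between successive centroids of a track, and the keys `path` and `counted` are hypothetical.

```python
import math

def path_length(path):
    """Sum of straight-line distances between consecutive centroids (an assumption)."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

def count_passers(tracks, path_length_threshold):
    """Return how many tracks newly exceeded the threshold; each is counted once."""
    newly_counted = 0
    for track in tracks:
        if not track["counted"] and path_length(track["path"]) > path_length_threshold:
            track["counted"] = True  # prevents counting the same object twice
            newly_counted += 1
    return newly_counted
```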

After steps 801-808 are completed, the background may be updated using the same method previously described for the looking zone 420 with respect to FIG. 5.

Output

In one embodiment, the output of the analysis system of the present invention may be an event stream that describes the interaction of people with a display 410. The present invention may be implemented to measure the following information:

• the number of people passing a display
• the number of people interacting with (looking at) a display
• the amount of time each person interacts with (looks at) the display

This information may be stored in a binary file, called an event file, on hard disk or other storage media. Event files are files created on each processing system 130 that is performing display analysis. The display analysis system creates two types of events, or records, in event files. The two types of events are looking events (corresponding to the looking zone 420) and passing events (corresponding to the passing zone 430).

When the processing system 130 detects a person in the looking zone 420 (as described previously), a new looking event is generated. The following table (TABLE 4) describes the data format of a customer looking event, in one embodiment:



TABLE 4

Field            Function
zone identifier  identifies the display
looking          identifies the amount of time the person interacted with the display
Timestamp        indicates the time of event creation



Each time the processing system 130 detects that a person has passed through passing zone 430, a new passing event is generated. In one embodiment, two or more customers passing the display at the same time are recorded in the same event. The following table (TABLE 5) describes the data format of a passing event:



TABLE 5

Field            Function
zone identifier  identifies the display
count            number of persons to pass
Timestamp        indicates the time of event creation
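
A binary event record of this kind might be sketched as follows. The description does not give field widths or byte order, so the layout below (a one-byte event type, a two-byte zone identifier, a four-byte value and an eight-byte timestamp, little-endian) is entirely hypothetical, as are the names `Event`, `LOOKING` and `PASSING`.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: type (1 byte), zone id (2), value (4), timestamp (8 bytes, float).
RECORD = struct.Struct("<BHId")
LOOKING, PASSING = 1, 2

@dataclass
class Event:
    kind: int         # LOOKING or PASSING
    zone_id: int      # identifies the display
    value: int        # looking time, or number of persons who passed
    timestamp: float  # time of event creation, seconds since the epoch

    def pack(self) -> bytes:
        return RECORD.pack(self.kind, self.zone_id, self.value, self.timestamp)

    @classmethod
    def unpack(cls, data: bytes) -> "Event":
        return cls(*RECORD.unpack(data))

def append_event(path, event):
    """Append one fixed-size record to the event file."""
    with open(path, "ab") as f:
        f.write(event.pack())

def read_events(path):
    """Read every record back from the event file."""
    with open(path, "rb") as f:
        data = f.read()
    return [Event.unpack(data[i:i + RECORD.size])
            for i in range(0, len(data), RECORD.size)]
```

Fixed-size records keep the file simple to append to and to scan, at the cost of sharing one value field between the looking time and the passing count.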



Background Adaptation 



Over time, the scene that is being viewed by a camera 110 can change. These changes may result from changes in lighting conditions, e.g., night and day, or changes in the physical location of objects within the camera 110 view. In order to detect people accurately within a camera 110 view, the present invention must be capable of adapting the current background to incorporate changes in the scene. Furthermore, since looking zones 420 and passing zones 430 can be located at different locations within the image, individual backgrounds may be maintained for each zone. The size of the background may be equal to the size of the zone.
In a preferred embodiment, the method used to update the backgrounds of looking zones 420 and passing zones 430 is the same. A separate background is maintained for each looking and passing zone defined within the view of an individual camera 110. At initialization time, each looking and passing zone for an individual display may be initialized by loading a background image file.

The process described by the following pseudo-code may be performed by processing system 130 in updating the background, with reference to FIG. 9.
[901] // Initialization
[902] backgroundFrame = background frame
[903] frameCount = 0
[904] previousFrame = current frame
[905] previousFrameCount = 0
[906] // End initialization
[907] for ( ;; )
[908]     get next frame
[909]     create_image_difference(current frame, previousFrame)
[910]     if ( percentage different > percentDifferenceThreshold )
[911]         previousFrameCount = 0
[912]         previousFrame = current frame
[913]         frameCount = 0
[914]     else
[915]         previousFrameCount = previousFrameCount + 1
[916]         if ( previousFrameCount > previousThreshold )
[917]             previousFrameCount = 0
[918]             previousFrame = current frame
[919]         end if
[920]         frameCount = frameCount + 1
[921]         if ( frameCount > countThreshold )
[922]             frameCount = 0
[923]             backgroundFrame = current frame
[924]         end if
[925]     end if
[926] end for

The basic concept behind the process described above and in FIG. 9 is to identify a consecutive sequence of input frames 115 that demonstrate very little change in content. If there is little or no change in the scene, the background image can be updated. In the approach described, change in content is identified by motion. If people are within the scene, their movement will prevent the system 130 from capturing a new background.

Beginning with step 901, the background is initialized to the background frame stored on disk or other storage media (step 902). In step 903, frameCount, which is the total number of images that have been processed meeting the motion constraint, is initialized to zero. In steps 904 and 905, previousFrame and previousFrameCount are used to increase system sensitivity to motion and change. previousFrame is set to the initial input frame and previousFrameCount is set to zero.

The two variables countThreshold and previousThreshold are parameters that control how quickly the system will acquire a new background. countThreshold is the number of consecutive frames meeting the motion constraint that must be processed before a new background will be acquired. A larger value for countThreshold decreases the frequency at which new backgrounds are acquired, i.e., increases the length of time the motion constraint must be met.



previousThreshold determines at what frequency the system acquires a new previousFrame. A new previousFrame is acquired whenever the motion constraint is not met or after every previousThreshold frames have been processed. Since the motion constraint is evaluated by calculating the image difference of the previousFrame and the current frame, previousThreshold can be used to configure the sensitivity of the system to motion. In some situations, the movement of people may be imperceptible between consecutive frames, i.e., very slow movement. However, when considered across a series of frames, motion can be easily identified.

After the system is initialized, each input frame is processed beginning at steps 907 and 908. The frame at this [...] In steps 922 and 923, frameCount is reset and the backgroundFrame is set equal to the current frame.
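
The background update loop of FIG. 9 might be sketched as follows. The sketch assumes that frames are flat lists of gray-scale values and that the percentage difference is the share of pixels whose absolute difference reaches a fixed `pixel_threshold`; that helper and the class name `BackgroundUpdater` are assumptions made for illustration.

```python
def percent_different(a, b, pixel_threshold):
    """Percentage of pixels that differ by at least pixel_threshold."""
    changed = sum(1 for p, q in zip(a, b) if abs(p - q) >= pixel_threshold)
    return 100.0 * changed / len(a)

class BackgroundUpdater:
    def __init__(self, background, first_frame, percent_threshold,
                 count_threshold, previous_threshold, pixel_threshold=10):
        self.background = background          # step 902
        self.frame_count = 0                  # step 903
        self.previous_frame = first_frame     # step 904
        self.previous_frame_count = 0         # step 905
        self.percent_threshold = percent_threshold
        self.count_threshold = count_threshold
        self.previous_threshold = previous_threshold
        self.pixel_threshold = pixel_threshold

    def process(self, frame):
        """Feed one frame (step 908); return True if the background was replaced."""
        diff = percent_different(frame, self.previous_frame, self.pixel_threshold)
        if diff > self.percent_threshold:     # step 910: motion constraint not met
            self.previous_frame_count = 0
            self.previous_frame = frame
            self.frame_count = 0
            return False
        self.previous_frame_count += 1        # step 915
        if self.previous_frame_count > self.previous_threshold:
            self.previous_frame_count = 0
            self.previous_frame = frame
        self.frame_count += 1                 # step 920
        if self.frame_count > self.count_threshold:
            self.frame_count = 0
            self.background = frame           # step 923
            return True
        return False
```

With `count_threshold=2`, a scene that changes once and then stays still yields a new background on the third consecutive still frame after the change.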

Segmentation Improvements 

As previously described, the difference threshold used in creating an image difference is a single integer number. However, this approach may have drawbacks. First, during configuration the user must configure the best threshold for the zone. This results in a configure, test, evaluate performance scenario, where the user must perform empirical testing to identify the best threshold. This is not the most desirable approach. Second, due to variations in lighting and background characteristics, it may be difficult to identify a single threshold that works well over the entire looking zone 420 or passing zone 430. For example, background differencing in scenes with bright, reflective floors generally requires a larger difference threshold than a dark scene, due to the impact of shadows.

One approach that may be implemented is to have a different threshold for each pixel in the image. With this approach, when image differencing is performed, the best threshold is used for each pixel in the zone. This is shown in the image difference equation below for a zone of size n by m.



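$$D(x,y) = f\bigl(I(x,y),\, B(x,y)\bigr), \qquad x = 0, \dots, n,\quad y = 0, \dots, m$$

where

$$f(i,b) = \begin{cases} 0, & \text{if } |i-b| < G(x,y) \\ 1, & \text{if } |i-b| \ge G(x,y) \end{cases}$$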

G(x,y) is a two dimensional array that contains the threshold to be used for each pixel in the looking or passing zone.

The thresholds contained in G(x,y) may be automatically calculated by the processing system 130 when a new background for the corresponding zone is captured. The following equation may be used for generating the threshold for each x,y location:

$$G(x,y) = -0.00000000000600\,k^{6} + 0.00000000412652\,k^{5} - 0.00000097234900\,k^{4} + 0.00008142579606\,k^{3} + 9.99999999999988$$

where k = B(x,y), the intensity value of pixel x,y in the current background image. G(x,y) is updated each time a new background image is acquired. The equation shown above has been derived by analyzing the amount of noise generated at various pixel intensities. In general, it has been found that higher intensity values are impacted more by noise, shadowing, and other environmental factors that create error. The equation above creates a larger threshold value for higher intensity pixel value locations in the scene.
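
Per-pixel thresholding might be sketched as follows. The coefficients are taken as best they can be read from the source equation; the negative sign on the fourth-power term and the exponent of the cubic term are assumptions, chosen because they give a threshold that grows from about 10 for a black pixel to about 48 for a pixel of intensity 255. Images are assumed to be lists of rows of gray-scale values.

```python
# Assumed reading of the polynomial: exponent -> coefficient.
COEFFS = {
    6: -0.00000000000600,
    5: 0.00000000412652,
    4: -0.00000097234900,   # sign is an assumption
    3: 0.00008142579606,    # exponent is an assumption
    0: 9.99999999999988,
}

def threshold_for(k):
    """Threshold G for a background pixel of intensity k."""
    return sum(c * k ** p for p, c in COEFFS.items())

def build_threshold_map(background):
    """Recompute G(x,y) whenever a new background is acquired."""
    return [[threshold_for(k) for k in row] for row in background]

def image_difference(image, background, thresholds):
    """D(x,y) = 0 if |I - B| < G, else 1, evaluated pixel by pixel."""
    return [[0 if abs(i - b) < g else 1 for i, b, g in zip(ir, br, gr)]
            for ir, br, gr in zip(image, background, thresholds)]
```

Under these assumptions, the same intensity change of 20 is flagged against a black background pixel but ignored against a background pixel of intensity 255.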

A second enhancement that can be used is to perform a noise elimination step on the image difference. This approach involves performing a series of morphological operations on the image difference. By performing a morphological erosion and dilation operation, noisy regions in the image difference can be reduced or eliminated. At the same time, areas containing objects that are fuzzy due to poor segmentation will be filled in.
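
The erosion and dilation might be sketched as follows for a binary image difference held as a list of rows of 0 and 1. The 3 x 3 neighbourhood and the treatment of pixels outside the image (they are ignored) are assumptions; the description does not specify a structuring element.

```python
def _apply(mask, combine):
    """Apply a 3x3 neighbourhood rule to every pixel of a binary mask."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            window = [mask[j][i]
                      for j in range(y - 1, y + 2)
                      for i in range(x - 1, x + 2)
                      if 0 <= j < rows and 0 <= i < cols]
            out[y][x] = combine(window)
    return out

def erode(mask):
    """A pixel survives only if its whole neighbourhood is set."""
    return _apply(mask, min)

def dilate(mask):
    """A pixel is set if any pixel in its neighbourhood is set."""
    return _apply(mask, max)

def remove_noise(mask):
    """Erosion followed by dilation: drops specks smaller than the neighbourhood."""
    return dilate(erode(mask))
```

Applying the two operations in the opposite order, `erode(dilate(mask))`, instead fills small holes inside an object, which corresponds to the filling-in effect mentioned above.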

Configuration Tool for Looking Zone and Passing Zone 

In order to configure one or more looking zones 420 and one or more passing zones 430, a configuration tool may be implemented, as described below. In the preferred embodiment, the configuration tool may be a Visual Basic application, operating on the processing system 130, that supports a graphical user interface.

The process for configuring looking zones 420 and passing zones 430 is as follows. From the configuration tool, the user captures an image from the frame grabber card 120 that is stored in memory 135. The camera image may then be displayed on the PC display (not specifically shown, but inherent in PC 130). Looking zones 420 and passing zones 430 may be created by using a mouse input device, or any other suitable input device of PC 130. For example, to create a display (e.g., data structure 331, etc.) for analysis, the user may select a "create a new display" option from a menu. All parameters associated with a display are now entered on a keyboard or other input device associated with the PC 130 (these parameters are described elsewhere with respect to FIG. 3).

The user may now create the looking zones 420 and passing zones 430 (e.g., data structures 341 and 342) associated with the display. A looking zone 420 may be created by selecting a "create a new looking zone" option (or other suitable option) from the menu. By "dragging" and "clicking" with a mouse, etc., the user may "draw" the location of the looking zone 420 on the image being displayed on the PC display 130. The bounding box coordinates are generated from the zone size and location on the display. After the zone is located and sized, the user may "double click" on the zone and enter via the keyboard all parameters associated with the looking zone (described previously with respect to FIG. 3). An analogous process is followed when creating a passing zone 430.
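
Generating bounding box coordinates from a mouse drag might be sketched as follows. The function name, the (left, top, right, bottom) ordering and the clamping to the image are assumptions for illustration; the description does not state how the coordinates are represented.

```python
def bounding_box(drag_start, drag_end, image_width, image_height):
    """Return (left, top, right, bottom) for a mouse drag, clamped to the image."""
    (x0, y0), (x1, y1) = drag_start, drag_end
    left, right = sorted((x0, x1))    # the drag may run in either direction
    top, bottom = sorted((y0, y1))
    left, right = max(0, left), min(image_width - 1, right)
    top, bottom = max(0, top), min(image_height - 1, bottom)
    return left, top, right, bottom
```

For a 320 x 240 image, a drag from (200, 50) to (120, 400) gives the box (120, 50, 200, 239).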

After all zones are created and configured, the parameters may then be saved to a storage device 150 (e.g., hard disk drive, etc.) as a parameter file that is loaded by the present invention at initialization time. This configuration process may be followed for each display 331 to be monitored.


Claims 

1. A process for analyzing movement patterns in a localized area, characterized by the steps of:

positioning a video camera (110) so as to view the localized area;
capturing sequential images from the video camera at selected intervals in time;
storing the sequential images in a memory (135);
detecting whether a person exists within the stored sequential images;
sensing the time that any detected person exists in the stored images; and
generating an event once the sensed time reaches a predetermined threshold.

2. A process according to claim 1 comprising the steps of:

dividing each stored sequential image into a looking zone and a passing zone;
detecting whether a person exists within the looking zone;
sensing the time that any detected person exists in the looking zone;
generating a looking event once the sensed time reaches a predetermined threshold;
detecting whether a person has passed through the passing zone; and
generating a passing event if a person is detected as having passed through the passing zone.

3. The process of claim 2, further comprising the step of analyzing the generated looking events and passing events in order to perform statistical analysis on the movement patterns of the persons passing through the looking zone and passing zone.

4. A system for analyzing movement patterns comprising:

a video camera (110);
means (135) for storing an image from the video camera; and
processing means (140) coupled to the video camera and the storing means arranged to:
sequentially store images from the video camera in the storing means at selected intervals in time;
detect whether a person exists within the sequential images stored in the storing means;
sense the amount of time that the person is detected as existing in the sequential images stored in the storing means; and
generate an event once the sensed time reaches a predetermined threshold.



5. The system of claim 4, wherein the storing means comprises:

a frame grabber (120) for capturing an image from the video camera;
memory (135) for storing the image captured by the frame grabber;

and wherein the processing means comprises a personal computer.